CAMPUS RECRUITMENT ANALYSIS

Dataset Overview

This dataset contains MBA campus recruitment data, collected to analyze the factors that affect recruitment. It was retrieved from Kaggle and contains the academic history of students who sat for campus placements at the university. It covers 215 MBA students. Below is a snippet of the dataset:

Objective

  • Which MBA specialization had the highest recruitment?
  • Does work experience add value for placement?
  • How does the salary vary?
  • What is the minimum sample needed to closely align with the population?
  • Can placement status be predicted from the academic history of the students?

Pre-processing

Data pre-processing removes records with empty values. The Salary column is handled conditionally: a missing salary counts as a missing value only if the placement status is Placed; otherwise the record is considered valid. A new attribute, ‘Grades’, is derived from each student's MBA percentage.
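The conditional rule above can be sketched in Python. This is a minimal illustration, not the report's own code: the column names (`status`, `salary`, `mba_p`) and the example rows are assumptions. The grade cut-offs are not stated in the report, so the ‘Grades’ attribute is left out.

```python
def is_valid_row(row):
    """Return True if the row should be kept after pre-processing.

    Assumed rule (taken from the description above): every field except
    'salary' must be present, and 'salary' must be present only when the
    student's status is 'Placed'.
    """
    for key, value in row.items():
        if key == "salary":
            continue
        if value is None or value == "":
            return False
    if row.get("status") == "Placed" and row.get("salary") in (None, ""):
        return False
    return True


# Hypothetical rows; the column names are illustrative only.
rows = [
    {"status": "Placed", "salary": 270000, "mba_p": 58.8},
    {"status": "Not Placed", "salary": None, "mba_p": 59.4},  # kept: no salary expected
    {"status": "Placed", "salary": None, "mba_p": 66.3},      # dropped: salary missing
    {"status": "Placed", "salary": 250000, "mba_p": None},    # dropped: empty value
]

clean = [row for row in rows if is_valid_row(row)]
print(len(clean))
```

Keeping the rule in one predicate makes it easy to check that not-placed students are not discarded just because they have no salary.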

Placement Status Analysis

How many of the students were placed?

##   Placement_Status Freq
## 1       Not Placed   67
## 2           Placed  148

Analysis of collective percentages

The density curve below shows the overall percentage distribution at each level of education.

  • MBA percentage is approximately normally distributed, with a mean of 62.28 and a variance of 34.03.
  • Employability Test percentage is slightly right-skewed, with a mean of 72.1 and a variance of 176.25.
  • Degree percentage is approximately normally distributed, with a mean of 66.37 and a variance of 54.15.
  • Higher Secondary (HSC) percentage is approximately normally distributed, with a mean of 66.33 and a variance of 118.76.
  • SSC percentage is slightly left-skewed, with a mean of 67.3 and a variance of 117.23.
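The summary statistics quoted above (mean, variance, direction of skew) can be computed as in the sketch below. The scores are made-up values, not the dataset. The report does not say which variance estimator was used, so the sample variance (divisor n - 1) is an assumption.

```python
import statistics


def describe(values):
    """Return (mean, sample variance, moment skewness) for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    # Moment-based skewness m3 / m2**1.5: positive means right-skewed,
    # negative means left-skewed, near zero means roughly symmetric.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = m3 / m2 ** 1.5
    return mean, variance, skewness


# Hypothetical percentages with one high value pulling the tail to the right.
scores = [55.0, 58.5, 60.0, 61.0, 62.5, 64.0, 66.0, 71.0, 80.0]
mean, variance, skewness = describe(scores)
print(f"mean={mean:.2f} variance={variance:.2f} skewness={skewness:.2f}")
```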

Summary of Percentages

  • Any outliers observed?
  • The range decreases as the education level increases.

Specialization and Placement Status Analysis

There are a total of 120 students enrolled in Finance and 95 in HR.

Conclusion and Findings:

  • The ratio of not-placed to placed students differs clearly by specialization.
  • It is approximately 1:4 for Finance and 7:9 for HR.
  • Students specializing in Finance are more sought after by the hiring committee.
  • This may indicate that more companies come to recruit for Finance.

Work Experience and Placement Status Analysis

There are 141 students who don’t have any prior work experience and 74 who have worked before.

Findings:

  • About 2 out of 5 students without work experience were not placed.
  • About 1 out of 8 students with work experience were not placed.
  • A student with work experience is more likely to be hired.
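As a quick arithmetic check, the rounded rates above can be combined with the group sizes (141 and 74). The sketch below uses only figures from the report. Because the fractions are rounded, the implied counts are approximate and are not the actual cell counts.

```python
from fractions import Fraction

# Group sizes are from the report; the fractions are its rounded findings.
groups = {
    "no work experience": {"students": 141, "not_placed_rate": Fraction(2, 5)},
    "work experience": {"students": 74, "not_placed_rate": Fraction(1, 8)},
}

# Approximate number of not-placed students implied by each rounded rate.
implied_not_placed = {
    name: float(g["students"] * g["not_placed_rate"]) for name, g in groups.items()
}
total_implied = sum(implied_not_placed.values())

# How many times more often students without work experience went unplaced.
rate_ratio = (
    groups["no work experience"]["not_placed_rate"]
    / groups["work experience"]["not_placed_rate"]
)

print(implied_not_placed)
print(round(total_implied, 2))  # compare with the 67 not-placed students reported
print(float(rate_ratio))
```

The implied total comes to about 66, close to the 67 not-placed students reported earlier, so the rounded fractions are consistent with the overall counts.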

3-D Scatter Plot

Representation:

  • Axis
    • x-axis: MBA Percentage
      • Range: 51.21 to 77.89
    • y-axis: Degree Percentage
      • Range: 50 to 91
    • z-axis: Employability Percentage
      • Range: 50 to 98
  • Color
    • Black: Not Placed.
    • Grey to Orange: Distribution in salary.

{Degree, MBA}: Students with a degree percentage below 65 include a higher number of unplaced students.
{MBA, Emp Test}: Salaries are lower for students who scored below 75 on both.

Salary Distribution

##      Minimum 1st Quartile       Median         Mean 3rd Quartile          Max 
##       200000       240000       265000       288655       300000       940000

Findings:

  • The salary distribution is highly right-skewed.
  • There are a few outliers on the higher end, above the upper bound of 390K.
  • The curve for the male salary distribution is narrower than the one for the female salary distribution.
  • The median salary for female students is lower than that for male students.
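The 390K upper bound agrees with the common 1.5 × IQR rule (Tukey's fences) applied to the quartiles in the summary above. The report does not name the rule it used, so that is an assumption. A short check:

```python
# Quartiles, mean, median and maximum as printed in the salary summary.
q1 = 240_000
median = 265_000
mean = 288_655
q3 = 300_000
maximum = 940_000

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers

print(upper_fence)
print(maximum > upper_fence)     # the maximum salary lies beyond the fence
print(mean > median)             # mean above median suggests a right skew
```

The lower fence falls below the minimum salary of 200000, so under this rule outliers can only appear on the high end.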

Central Limit Theorem

  • The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
  • This is tested by drawing samples of different sizes from Salaries:
##   Sample.10 Sample.20 Sample.30 Sample.40
## 1    283400    285750  318200.0    319825
## 2    288100    353900  323800.0    320750
## 3    383100    323350  318233.3    301450
## 4    324700    318150  295800.0    272125
## 5    342300    324550  266666.7    289275
## 6    304400    278350  286200.0    283150
## [1] "For Sample Size: 10, Mean: 288631.40, Standard Deviation, 29348.50"
## [2] "For Sample Size: 20, Mean: 288760.49, Standard Deviation, 20972.60"
## [3] "For Sample Size: 30, Mean: 288682.53, Standard Deviation, 17097.72"
## [4] "For Sample Size: 40, Mean: 288570.60, Standard Deviation, 14680.67"

Findings:

  • For all sample sizes, the mean of the sample means is about the same, 288K, and the standard deviation decreases as the sample size increases.
  • The figure shows the distribution of sample mean salaries when 5000 samples are drawn.
  • As the sample size increases, the distribution becomes narrower.
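The shrinking spread follows the standard-error relation: the standard deviation of the sample mean is roughly σ/√n. The sketch below does two things. It back-computes σ from the standard deviations printed above, and it repeats the experiment on a synthetic population. The log-normal population, its parameters, the population size of 148 (one salary per placed student), and sampling with replacement are all assumptions. The real salary data is not reproduced here.

```python
import math
import random
import statistics

# Standard deviations of the sample means, as printed in the output above.
reported_sd = {10: 29348.50, 20: 20972.60, 30: 17097.72, 40: 14680.67}
# If sd = sigma / sqrt(n), then sd * sqrt(n) should be about the same for all n.
implied_sigma = {n: sd * math.sqrt(n) for n, sd in reported_sd.items()}
print({n: round(s) for n, s in implied_sigma.items()})

random.seed(42)
# Hypothetical right-skewed "salary" population; not the real data.
population = [random.lognormvariate(12.5, 0.3) for _ in range(148)]
sigma = statistics.pstdev(population)


def sample_mean_sd(sample_size, draws=5000):
    """Standard deviation of the means of `draws` samples (with replacement)."""
    means = [
        statistics.fmean(random.choices(population, k=sample_size))
        for _ in range(draws)
    ]
    return statistics.stdev(means)


results = {n: sample_mean_sd(n) for n in (10, 20, 30, 40)}
for n, sd in results.items():
    # Simulated spread next to the theoretical value sigma / sqrt(n).
    print(n, round(sd), round(sigma / math.sqrt(n)))
```

The implied σ values all come out near 93K, which is what the relation predicts if every sample size was drawn from the same salary population.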

Sampling Methods

Sampling methods are applied to analyze a smaller subset of the population, derive its pattern, and check which method gives results closest to those of the population. For the Campus Recruitment project, four sampling methods are used:

  • Simple Random Sampling Without Replacement.
  • Systematic Sampling.
  • Inclusion Probabilities.
  • Stratified sampling.
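The first two methods in the list can be sketched with the Python standard library. The population size of 215 is from the report. The target sample size of 22 is an assumption based on the sample tables in this section, and students are represented only by index.

```python
import random

random.seed(1)

population_size = 215   # students in the report
sample_size = 22        # assumed target, roughly the size of the samples shown

indices = list(range(population_size))

# Simple random sampling without replacement: every subset of the given
# size is equally likely, and no student is picked twice.
srswor = sorted(random.sample(indices, sample_size))

# Systematic sampling: pick a random start within the first interval,
# then take every k-th student after it.
k = population_size // sample_size   # sampling interval
start = random.randrange(k)
systematic = indices[start::k]       # about N / k students, may slightly exceed the target

print(srswor)
print(systematic)
```

Because systematic sampling depends on the row order, a dataset sorted by a relevant variable can produce samples that miss whole groups.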

For the overall population, the ratio of not-placed to placed students is almost 1:2.

##            A A- B+ B-  C  D
## Not Placed 1  4 14 20 20  8
## Placed     3 16 31 45 39 14
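The "almost 1:2" ratio can be checked directly from the table above. The counts below are copied from it. The per-grade share is an extra summary added for illustration.

```python
grades = ["A", "A-", "B+", "B-", "C", "D"]
not_placed = [1, 4, 14, 20, 20, 8]
placed = [3, 16, 31, 45, 39, 14]

total_not_placed = sum(not_placed)
total_placed = sum(placed)
ratio = total_placed / total_not_placed   # placed students per not-placed student

# Share of placed students within each grade.
placed_share = {
    g: p / (p + n) for g, n, p in zip(grades, not_placed, placed)
}

print(total_not_placed, total_placed, round(ratio, 2))
for grade, share in placed_share.items():
    print(grade, f"{share:.0%}")
```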

Simple Random Sampling Without Replacement

##                 Grade
## Placement_Status A A- B+ B- C D
##       Not Placed 0  0  1  0 5 1
##       Placed     1  3  2  3 2 4

Findings:

  • All sampled students in the top two grades (A and A-) were placed.
  • Grade C has the most not-placed students (5 of 7), in contrast to the population, where most Grade C students were placed.
  • Gives no information on not-placed students in grades A, A- and B-.

Systematic Sampling

##                 Grade
## Placement_Status A A- B+ B- C D
##       Not Placed 0  1  0  3 3 0
##       Placed     1  2  1  5 5 0

Findings:

  • Provides no information on students with a D grade.
  • Gives no information on not-placed students in grades A and B+.
  • Grades A-, B- and C follow a ratio of almost 1:2 (not placed to placed).

Inclusion Probabilities

##                 Grade
## Placement_Status A A- B+ B- C D
##       Not Placed 0  1  1  0 1 2
##       Placed     0  0  3  6 6 2

Findings:

  • Provides no information on students with an A grade.
  • Gives no information on placed students in grade A-.
  • Gives no information on not-placed students in grade B-.
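One common way to sample with inclusion probabilities is systematic sampling with unequal probabilities: each unit's chance of selection is proportional to a size measure. The report does not say which size variable or procedure it used. In the sketch below the size values are random placeholders and the function names are hypothetical.

```python
import random
from itertools import accumulate


def inclusion_probabilities(sizes, n):
    """First-order inclusion probabilities proportional to `sizes`, summing to n."""
    total = sum(sizes)
    probs = [n * s / total for s in sizes]
    if max(probs) > 1:
        raise ValueError("an inclusion probability exceeds 1; reduce n or cap it")
    return probs


def systematic_unequal_sample(probs, rng):
    """Select unit k when a point u, u+1, u+2, ... falls in its cumulative interval."""
    n = round(sum(probs))
    upper_bounds = list(accumulate(probs))
    u = rng.random()
    selected = []
    idx = 0
    for j in range(n):
        point = u + j
        # Move to the unit whose cumulative interval contains the point.
        while idx < len(upper_bounds) - 1 and upper_bounds[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


rng = random.Random(7)
sizes = [rng.uniform(51, 78) for _ in range(215)]   # placeholder size measure
probs = inclusion_probabilities(sizes, n=22)
sample = systematic_unequal_sample(probs, rng)
print(sample)
```

Since every probability is at most 1 and the selection points are one unit apart, no student can be chosen twice.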

Stratified Sampling

##                 Grade
## Placement_Status A A- B+ B- C D
##       Not Placed 1  1  2  1 3 3
##       Placed     1  2  1  1 4 1

Findings:

  • Gives information for all grades, but looks left-skewed.
  • The equal split (1:1) for grades A and B- is misleading.
  • Grades B+ and D show more not-placed than placed students, in contrast to the population.
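Stratified sampling covers every grade because each stratum is sampled separately. The sketch below uses grade as the stratum and proportional allocation with at least one student per grade. Both choices are assumptions, since the report does not state its stratification variable or allocation. The grade totals come from the population table, and students are represented only by index.

```python
import random

rng = random.Random(3)

# Grade totals (not placed + placed) from the population table.
grade_counts = {"A": 4, "A-": 20, "B+": 45, "B-": 65, "C": 59, "D": 22}

# Assign consecutive student indices to each grade stratum.
strata = {}
next_id = 0
for grade, count in grade_counts.items():
    strata[grade] = list(range(next_id, next_id + count))
    next_id += count

population_size = next_id
sample_size = 21        # assumed target size

sample = {}
for grade, members in strata.items():
    # Proportional allocation, with at least one student from every stratum.
    quota = max(1, round(sample_size * len(members) / population_size))
    sample[grade] = rng.sample(members, quota)

print({grade: len(chosen) for grade, chosen in sample.items()})
```

Proportional allocation keeps the grade mix of the sample close to the population, whereas giving each grade a similar number of draws over-represents small strata such as grade A.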

Decision Tree Classification

Binary classification is performed on the dataset to predict the placement status of the students. A decision tree classification algorithm is chosen for this process.

First, the dataset was divided into a training set and a testing set using the sample function. A model is fitted on the training set and then used on the test set to predict placement status. The confusion matrix provides the basis for the performance metrics: accuracy, F1 score and precision are calculated for the model.
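The split step can be sketched as a random partition of row indices. The test size of 65 matches the total in the confusion matrix reported for this model. The seed and the index-based representation are assumptions, and the model fitting itself is not shown.

```python
import random

rng = random.Random(2024)   # arbitrary seed so the split is reproducible

n_students = 215            # rows in the dataset
n_test = 65                 # size of the test set in the confusion matrix

student_ids = list(range(n_students))

# Draw the test rows at random without replacement; the rest form the training set.
test_ids = set(rng.sample(student_ids, n_test))
train_ids = [i for i in student_ids if i not in test_ids]

print(len(train_ids), len(test_ids))
```

Holding out rows the model never saw during fitting is what makes the reported accuracy an estimate of performance on new students.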

Findings: Based on SSC percentage, HSC percentage, gender, and degree percentage, the algorithm builds a decision tree that shows the probability of a student being placed.

Confusion Matrix

##             Predicted
## Actual       Not Placed Placed Sum
##   Not Placed         13      8  21
##   Placed              1     43  44
##   Sum                14     51  65

Performance Measures

## [1] "Precision for predicting the status of job is: 84.31%"
## [1] "F1 Score for predicting the status of job is: 0.905263"
## [1] "Accuracy for predicting the status of job is: 86.15%"
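The three measures can be reproduced from the confusion matrix, treating "Placed" as the positive class. Recall is not reported above; it is computed here only because F1 needs it.

```python
# Cells of the confusion matrix above (rows: actual, columns: predicted).
tn, fp = 13, 8    # actual Not Placed: predicted Not Placed / predicted Placed
fn, tp = 1, 43    # actual Placed:     predicted Not Placed / predicted Placed

precision = tp / (tp + fp)                       # correct among predicted Placed
recall = tp / (tp + fn)                          # found among actual Placed
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 score:  {f1:.6f}")
print(f"Accuracy:  {accuracy:.2%}")
```

Most of the errors are the 8 not-placed students predicted as placed, which is why precision is lower than recall.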

Conclusion

  • The Marketing and Finance specialization has the higher recruitment rate.
  • The “Not placed” rate is much lower for students with prior work experience.
  • A male student with a B- grade received the highest salary.
  • Systematic sampling is chosen as the best sampling method for the given population.
  • The model achieves an accuracy of 86.15%.